home *** CD-ROM | disk | FTP | other *** search
-
- REGEX Globber (Wild Card Matching)
-
- A *IX SH style pattern matcher written in C
- V1.10 Dedicated to the Public Domain
-
- March 12, 1991
- J. Kercheval
- [72450,3702] -- johnk@wrq.com
-
-
-
-
- *IX SH style Regular Expressions
- ================================
-
- The *IX command SH is a working shell similar in feel to the MSDOS
- shell COMMAND.COM. In point of fact much of what we see in our
- familiar DOS PROMPT was gleaned from the early UNIX shells available
- for many of machines the people involved in the computing arena had
- at the time of the development of DOS and it's much maligned
- precursor CP/M (although the UNIX shells were and are much more
- flexible and powerful then those on the current flock of micro
- machines). The designers of DOS and CP/M did some fairly strange
- things with their command processor and OS. One of those things was
- to only selectively adopt the regular expressions allowed within the
- *IX shells. Only '?' and '*' were allowed in filenames and even with
- these the '*' was allowed only at the end of a pattern and in fact
- when used to specify the filename the '*' did not apply to extension.
- This gave rise to the all too common expression "*.*".
-
- REGEX Globber is a SH pattern matcher. This allows such
- specifications as *75.zip or * (equivelant to *.* in DOS lingo).
- Expressions such as [a-e]*t would fit the name "apple.crt" or
- "catspaw.bat" or "elegant". This allows considerably wider
- flexibility in file specification, general parsing or any other
- circumstance in which this type of pattern matching is wanted.
-
- A match would mean that the entire string TEXT is used up in matching
- the PATTERN and conversely the matched TEXT uses up the entire
- PATTERN.
-
- In the specified pattern string:
- `*' matches any sequence of characters (zero or more)
- `?' matches any character
- `\' suppresses syntactic significance of a special character
- [SET] matches any character in the specified set,
- [!SET] or [^SET] matches any character not in the specified set.
-
- A set is composed of characters or ranges; a range looks like
- 'character hyphen character' (as in 0-9 or A-Z). [0-9a-zA-Z_] is the
- minimal set of characters allowed in the [..] pattern construct.
- Other characters are allowed (ie. 8 bit characters) if your system
- will support them (it almost certainly will).
-
- To suppress the special syntactic significance of any of `[]*?!^-\',
- and match the character exactly, precede it with a `\'.
-
- To view several examples of good and bad patterns and text see the
- output of MATCHTST.BAT
-
-
-
- MATCH() and MATCHE()
- ====================
-
- The match module as written has two parsing routines, one is matche()
- and the other is match(). Since match() is a call to matche() which
- simply has its output mapped to a BOOLEAN value (ie TRUE if pattern
- matches or FALSE otherwise), I will concentrate my explanations here
- on matche().
-
- The purpose of matche() is to match a pattern against a string of
- text (usually a file name or specification). The match routine has
- extensive pattern validity checking built into it as part of the
- parser and allows for a robust pattern match.
-
- The parser gives an error code on return of type int. The error code
- will be one of the the following defined values (defined in match.h):
-
- MATCH_PATTERN - bad pattern or misformed pattern
- MATCH_LITERAL - match failed on character match (standard
- character)
- MATCH_RANGE - match failure on character range ([..] construct)
- MATCH_ABORT - premature end of text string (pattern longer
- than text string)
- MATCH_END - premature end of pattern string (text longer
- than pattern called for)
- MATCH_VALID - valid match using pattern
-
- The functions are declared as follows:
-
- BOOLEAN match (char *pattern, char *text);
-
- int matche(register char *pattern, register char *text);
-
-
-
- IS_VALID_PATTERN() and IS_PATTERN()
- ===================================
-
- There are two routines for determining properties of a pattern
- string. The first, is_pattern(), is designed simply to determine if
- some character exists within the text which is consistent with a SH
- regular expression (this function returns TRUE if so and FALSE if
- not). The second, is_valid_pattern() is designed to check the
- validity of a given pattern string (TRUE return if valid, FALSE if
- not). By 'validity', I mean well formed or syntactically correct.
-
- In addition, is_valid_pattern() has as one of it's parameters a
- return code for determining the type of error found in the pattern if
- one exists. The error codes are as follows and defined in match.h:
-
- PATTERN_VALID - pattern is well formed
- PATTERN_ESC - pattern has invalid literal escape ('\' at end of
- pattern)
- PATTERN_RANGE - [..] construct has a no end range in a '-' pair
- (ie [a-])
- PATTERN_CLOSE - [..] construct has no end bracket (ie [abc-g )
- PATTERN_EMPTY - [..] construct is empty (ie [])
-
- The functions are declared as follows:
-
- BOOLEAN is_valid_pattern (char *pattern, int *error_type);
-
- BOOLEAN is_pattern (char *pattern);
-
- ----------------end-of-author's-documentation---------------
-
- Software Library Information:
-
- This disk copy provided as a service of
-
- The Public (Software) Library
-
- We are not the authors of this program, nor are we associated
- with the author in any way other than as a distributor of the
- program in accordance with the author's terms of distribution.
-
- Please direct shareware payments and specific questions about
- this program to the author of the program, whose name appears
- elsewhere in this documentation. If you have trouble getting
- in touch with the author, we will do whatever we can to help
- you with your questions. All programs have been tested and do
- run. To report problems, please use the form that is in the
- file PROBLEM.DOC on many of our disks or in other written for-
- mat with screen printouts, if possible. The P(s)L cannot de-
- bug programs over the telephone.
-
- Disks in the P(s)L are updated monthly, so if you did not get
- this disk directly from the P(s)L, you should be aware that
- the files in this set may no longer be the current versions.
-
- For a copy of the latest monthly software library newsletter
- and a list of the 2,000+ disks in the library, call or write
-
- The Public (Software) Library
- P.O.Box 35705
- Houston, TX 77235-5705
- (713) 524-6394
-